Perceptual Flicker Visibility Prediction Model

Authors

  • Lark Kwon Choi
  • Alan C. Bovik
Abstract

The mere presence of spatiotemporal distortions in digital videos does not necessarily imply quality degradation, since distortion visibility can be strongly reduced by the perceptual phenomenon of visual masking. Flicker is a particularly annoying occurrence that can arise from a variety of distortion processes, yet flicker too can be suppressed by masking. We propose a perceptual flicker visibility prediction model based on a recently discovered visual change silencing phenomenon. The proposed model predicts flicker visibility on both static and moving regions without any need for content-dependent thresholds. Using a simple model of cortical responses to video flicker, an energy model of motion perception, and a divisive normalization stage, the system captures the local spectral signatures of flicker distortions and predicts perceptual flicker visibility. The model not only predicts silenced flicker distortions in the presence of motion, but also provides a pixel-wise flicker visibility index. Results show that the predicted flicker visibility correlates well with human percepts of flicker distortions tested on the LIVE Flicker Video Database, and that the model is highly competitive with current flicker visibility prediction methods.

Introduction

Digital videos are increasingly pervasive due to the rapid proliferation of video streaming services, video sharing in social networks, and the global increase of mobile video traffic [1], [2]. The dramatic growth of digital video and user demand for high-quality video have necessitated the development of precise automatic perceptual video quality assessment (VQA) tools to help provide satisfactory levels of Quality of Experience (QoE) to the end user [3]. To achieve optimal video quality under limited bandwidth and power consumption, video coding technologies commonly employ lossy coding schemes, which cause compression artifacts that can degrade perceptual video quality [4]. In addition, compressed videos can suffer from transmission distortions, including packet losses and playback interruptions triggered by channel throughput fluctuations. Since humans are generally the ultimate arbiters of the received videos, predicting and reducing perceptual visual distortions of compressed digital videos is of great interest [5].

Researchers have performed a large number of subjective studies to understand the essential factors that influence video quality: analyzing compression artifacts and transmission distortions of compressed videos [6], investigating dynamic, time-varying distortions [7], and probing the time-varying subjective quality of rate-adaptive videos [8]. Substantial progress has also been made toward understanding and modeling low-level visual processes in the vision system, extending from the retina to primary visual cortex and extra-striate cortex [9]. As a result, perceptual models of disruptions to natural scene statistics [10] and of visual masking [11] have been widely applied to predict perceptual visual quality. Spatial distortions are effectively predicted by VQA algorithms such as SSIM [12], VQM [13], MOVIE [14], STRRED [15], and Video-BLIINDS [16]. Spatial masking is well modeled in modern perceptual image and video quality assessment tools, video compression, and watermarking. However, temporal visual masking is not well modeled, although one form of it has been observed to occur near scene changes [17] and has been used in early video compression methods [18-20].
Among temporal distortions, flicker is particularly challenging to predict and often occurs in low bit-rate compressed videos. Flicker distortion is a spatially local or global temporal fluctuation of luminance or chrominance in videos. Local flicker arises mainly from coarse quantization, varying prediction modes, mismatches between inter-frame blocks, improper deinterlacing, and dynamic rate changes caused by adaptive rate control methods [21-25]. Mosquito noise and stationary-area fluctuations are also often categorized as local flicker. Mosquito noise is a joint effect of object motion and time-varying spatial artifacts, such as ringing and motion prediction errors, near high-contrast sharp edges or moving objects, while stationary-area fluctuations result from different types of prediction, quantization levels, or a combination of these factors on static regions [4], [21].

Current flicker visibility prediction methods operate on a compressed video by measuring the Sum of Squared Differences (SSD) between the block difference of an original video and the block difference of a compressed video, where the block difference is computed between successive frames on macroblocks. When the sum of squared block differences on an original video falls below a threshold, a static region is indicated [22]. The ratio between the luminance fluctuation in the compressed video and that in the original video has also been used [23]. To improve the prediction of flicker-prone blocks, a normalized fraction model was proposed [24], in which the difference of the SSDs of the original and compressed block differences is divided by their sum. These methods have the virtue of simplicity, but the resulting flicker prediction performance is limited and content-dependent (a sketch of these block-based measures appears at the end of this Introduction). Another method included the influence of motion on flicker prediction by applying motion compensation prior to the SSD calculation [25]. The mean absolute discrete temporal derivatives of the average DC coefficients of DCT blocks have also been used to measure sudden local changes (flicker) in a VQA model [16]. Current flicker prediction methods are limited to block-wise accuracy. Further, perceptual flicker visibility prediction based on the human visual system (HVS), e.g., accounting for temporal visual masking, has not yet been extensively studied.

Recently, Suchow and Alvarez [26] demonstrated a striking "motion silencing" illusion, a powerful temporal visual masking phenomenon called change silencing, in which salient temporal changes of objects in luminance, color, size, and shape appear to cease in the presence of large object motions. This motion-induced failure to detect change not only suggests a tight coupling between motion and object appearance, but also reveals that commonly occurring temporal distortions such as flicker may be dramatically suppressed by the presence of motion. Physiologically plausible explanations of motion silencing have been proposed [26-29]. However, since the effect had only been studied on highly synthetic stimuli such as moving dots, we performed a series of human subjective studies on naturalistic videos, where flicker visibility was observed to be strongly reduced by large coherent object motions [30-33]. A consistent physiological and computational model that detects motion silencing could be useful for probing perceptual flicker visibility in compressed videos. In this paper, we propose a new perceptual flicker visibility prediction model based on motion silencing.
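For concreteness, the block-based measures of [22]-[24] discussed above might be implemented roughly as follows. This is a minimal sketch of one plausible reading of the normalized fraction model [24], with the per-block SSDs of the temporal frame differences as intermediate quantities; the function name, default block size, and stabilizing constant are illustrative, not taken from the cited papers, and frames are assumed to be 2D grayscale NumPy arrays.

```python
import numpy as np

def block_flicker_scores(prev_ref, cur_ref, prev_cmp, cur_cmp,
                         block=16, eps=1e-8):
    """Block-wise flicker measure from temporal frame differences.

    For each macroblock, compute the SSD of the temporal difference in the
    reference and in the compressed video, then form a normalized fraction
    |SSD_cmp - SSD_ref| / (SSD_cmp + SSD_ref). Values near 1 flag blocks
    whose temporal activity differs strongly between the two videos.
    """
    d_ref = cur_ref.astype(np.float64) - prev_ref.astype(np.float64)
    d_cmp = cur_cmp.astype(np.float64) - prev_cmp.astype(np.float64)
    rows, cols = d_ref.shape[0] // block, d_ref.shape[1] // block
    scores = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            sl = np.s_[i * block:(i + 1) * block,
                       j * block:(j + 1) * block]
            ssd_ref = np.sum(d_ref[sl] ** 2)  # temporal activity, reference
            ssd_cmp = np.sum(d_cmp[sl] ** 2)  # temporal activity, compressed
            # Normalized fraction: difference of the SSDs over their sum.
            scores[i, j] = abs(ssd_cmp - ssd_ref) / (ssd_cmp + ssd_ref + eps)
    return scores
```

As the text notes, such measures are block-wise and content-dependent: the same score can correspond to very different perceived flicker depending on local motion, which is what the proposed perceptual model addresses.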
The new perceptual flicker visibility prediction model is a significant step toward improving the performance of VQA models, since it makes possible a model of temporal masking of temporal distortions. The model measures the bandpass filter responses to a reference video and a corresponding flicker video using a localized multiscale 3D space-time Gabor filter bank [34], [35], a spatiotemporal energy model of motion perception [36], and a divisive normalization model of nonlinear gain control in primary visual cortex [37]. We observed that flicker produces locally separated spectral signatures that lie nearly along the same orientation as the motion-tuned plane of the reference video, but at a distance from it. The captured V1 responses to these flicker-induced spectral signatures generally decrease as object speed increases. We then measured the local difference of the bandpass responses at each space-time frequency orientation and defined the sum of the magnitudes of these differences as a perceptual flicker visibility index. The proposed model predicts temporal masking effects on flicker distortions and thereby achieves highly competitive performance against previous flicker visibility prediction methods.

Background: Motion Perception

Motion perception is the process of inferring the speed and direction of moving objects. Since motion perception is important for understanding flicker distortions in videos, we model it in the frequency domain. Watson and Ahumada [38] proposed a model of how humans sense the velocity of moving images, in which the motion-sensing elements appear locally tuned to specific spatiotemporal frequencies. Assuming that the complex motions of a video without scene changes can be constructed by piecing together spatiotemporally localized image patches undergoing translation, we can model the local spectral signatures of a video when an image patch moves [38]. An arbitrary space-time image patch can be represented by a function a(x, y, t) at each point (x, y) and time t, and its Fourier transform by A(u, v, w), where u, v, and w are the spatial and temporal frequency variables corresponding to x, y, and t, respectively. Let λ and φ denote the horizontal and vertical velocity components of the image patch. When the image patch translates at constant velocity [λ, φ], the moving video sequence becomes b(x, y, t) = a(x − λt, y − φt, t). The spectrum of a stationary image patch lies on the (u, v) plane, while the Fourier transform shears into an oblique plane through the origin when the image patch moves; the orientation of this plane indicates the speed and direction of motion (a short derivation is sketched below, after the Linear Decomposition overview).

Prediction of Perceptual Flicker Visibility

Linear Decomposition

Natural environments are inherently multi-scale and multi-orientation, and objects move multi-directionally at diverse speeds. To efficiently encode visual signals, the vision system decomposes the visual world over scales, orientations, directions, and speeds. Cortical neurons in Area V1 are selective for spatiotemporal frequency and orientation, while neurons in Area MT are selective for the velocity of visual stimuli [37], [39].

[Figure 1. Gabor filter bank in the frequency domain. (a) Geometry of the Gabor filter bank. (b) A slice of the Gabor filter bank along the plane of zero temporal frequency. (c) A slice of the Gabor filter bank along the plane of zero vertical spatial frequency.]
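To make the sheared-plane property from the Background section explicit, suppose, as an idealization, that the translating patch is otherwise static, so that b(x, y, t) = a(x − λt, y − φt). Its 3D Fourier transform then factors after the substitutions x′ = x − λt and y′ = y − φt:

```latex
\begin{align*}
B(u, v, w)
  &= \iiint a(x - \lambda t,\, y - \varphi t)\,
     e^{-j 2\pi (u x + v y + w t)}\, \mathrm{d}x\, \mathrm{d}y\, \mathrm{d}t \\
  &= \iint a(x', y')\, e^{-j 2\pi (u x' + v y')}\, \mathrm{d}x'\, \mathrm{d}y'
     \int e^{-j 2\pi (\lambda u + \varphi v + w)\, t}\, \mathrm{d}t \\
  &= A(u, v)\, \delta(\lambda u + \varphi v + w).
\end{align*}
```

The delta factor confines the spectrum to the plane λu + φv + w = 0 through the origin: for λ = φ = 0 it collapses onto the (u, v) plane, and otherwise its tilt encodes the speed and direction of motion.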
Since the responses of simple cells in Area V1 are well modeled as linear and bandpass [34], [35], linear decompositions are widely used to model the spatiotemporal responses to video signals [14], [40]. The receptive field profiles of V1 simple cells are well modeled by Gabor filters [34], [35]. Hence, we used a bank of spatiotemporally separable Gabor filters to model the responses of V1 simple cells to videos. A 3D separable spatiotemporal Gabor filter h(x, y, t) is the product of a complex exponential with a Gaussian envelope:

$$ h(x, y, t) = \frac{1}{(2\pi)^{3/2}\,\sigma_x \sigma_y \sigma_t} \exp\!\left[-\left(\frac{x^2}{2\sigma_x^2} + \frac{y^2}{2\sigma_y^2} + \frac{t^2}{2\sigma_t^2}\right)\right] \exp\!\left[\,j 2\pi (u_0 x + v_0 y + w_0 t)\right], $$

where (u_0, v_0, w_0) is the center frequency of the filter and (σ_x, σ_y, σ_t) set the spatiotemporal extent of the Gaussian envelope.
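A minimal NumPy construction of such a filter, and of a toy bank, follows. The kernel axes are ordered (t, y, x) to match a video array of shape (T, H, W); the function name, kernel size, bandwidths, and tuning frequencies are illustrative choices, not the parameters used in the paper.

```python
import numpy as np

def gabor_3d(size, sigmas, center_freq):
    """3D separable Gabor kernel: a unit-volume Gaussian envelope times a
    complex exponential carrier. `sigmas` = (st, sy, sx) in samples;
    `center_freq` = (w0, v0, u0) in cycles per sample; `size` is odd."""
    r = size // 2
    coords = np.arange(-r, r + 1, dtype=np.float64)
    t, y, x = np.meshgrid(coords, coords, coords, indexing='ij')
    st, sy, sx = sigmas
    w0, v0, u0 = center_freq
    envelope = np.exp(-(t**2 / (2 * st**2) + y**2 / (2 * sy**2)
                        + x**2 / (2 * sx**2)))
    envelope /= (2 * np.pi) ** 1.5 * st * sy * sx   # normalizing constant
    carrier = np.exp(2j * np.pi * (w0 * t + v0 * y + u0 * x))
    return envelope * carrier

# A toy multi-orientation, multi-speed bank: a few spatial frequencies and
# orientations, each at a few temporal frequencies (i.e., tuned speeds).
bank = [gabor_3d(size=11, sigmas=(2.0, 2.0, 2.0),
                 center_freq=(w0, f * np.sin(th), f * np.cos(th)))
        for f in (0.10, 0.20)                  # spatial frequency magnitude
        for th in (0.0, np.pi / 4, np.pi / 2)  # spatial orientation
        for w0 in (0.0, 0.10)]                 # temporal frequency
```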

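Given such a bank (e.g., `bank` from the sketch above), the remaining stages described earlier, quadrature motion energy [36], divisive normalization [37], and the summed magnitude differences that define the flicker visibility index, might be sketched as follows. This is a simplified reading of the pipeline rather than the authors' exact implementation; in particular, the semi-saturation constant K and the pooling of energy over the whole bank are assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def normalized_energies(video, bank, K=0.1):
    """Motion energy followed by divisive normalization. `video` is a float
    array of shape (T, H, W); `bank` is a list of complex 3D Gabor kernels."""
    energies = []
    for h in bank:
        # The real and imaginary parts of a complex Gabor form a quadrature
        # pair; the sum of their squared responses is the local energy [36].
        re = convolve(video, h.real, mode='nearest')
        im = convolve(video, h.imag, mode='nearest')
        energies.append(re ** 2 + im ** 2)
    E = np.stack(energies)                      # (n_filters, T, H, W)
    # Divisive normalization: each response is scaled by the energy pooled
    # across the bank plus a semi-saturation constant [37].
    return E / (E.sum(axis=0, keepdims=True) + K ** 2)

def flicker_visibility_index(ref, dst, bank):
    """Pixel-wise flicker visibility: sum over the bank of the magnitude
    differences between normalized responses to the two videos."""
    return np.abs(normalized_energies(dst, bank)
                  - normalized_energies(ref, bank)).sum(axis=0)
```

In the spirit of the model described above, on static regions the flicker-induced spectral components dominate this difference, while large coherent motion reduces the normalized responses to those components, so the index drops, mirroring the silencing of flicker by motion.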



Publication date: 2016